• Tangential Wasserstein Projections

    Updated: 2024-06-28 16:36:08
    We develop a notion of projections between sets of probability measures using the geometric properties of the $2$-Wasserstein space. In contrast to existing methods, it is designed for multivariate probability measures that need not be regular, and is computationally efficient to implement via regression. The idea is to work on tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings where probability measures need not be regular, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the synthetic controls method for systems with general heterogeneity described via multivariate probability measures.
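    The $2$-Wasserstein geometry this construction builds on can be made concrete in one dimension, where the $W_2$ distance between two empirical measures with equally many atoms reduces to the $L_2$ distance between sorted samples (the monotone coupling is optimal). A minimal numpy sketch, illustrative only and not the paper's tangent-cone method; `w2_empirical_1d` is an invented helper name:

```python
import numpy as np

def w2_empirical_1d(x, y):
    """2-Wasserstein distance between two 1-D empirical measures with the
    same number of equally weighted atoms: the optimal coupling is the
    monotone (sorted-to-sorted) matching."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape
    return np.sqrt(np.mean((x - y) ** 2))

# Translating a measure by c moves it by exactly |c| in W2.
x = np.array([0.0, 1.0, 2.0])
print(w2_empirical_1d(x, x + 3.0))  # → 3.0
```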

  • Differentially private methods for managing model uncertainty in linear regression

    Updated: 2024-06-28 16:36:08
    In this article, we propose differentially private methods for hypothesis testing, model averaging, and model selection for normal linear models. We propose Bayesian methods based on mixtures of $g$-priors and non-Bayesian methods based on likelihood-ratio statistics and information criteria. The procedures are asymptotically consistent and straightforward to implement with existing software. We focus on practical issues such as adjusting critical values so that hypothesis tests have adequate type I error rates and quantifying the uncertainty introduced by the privacy-ensuring mechanisms.
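    The privacy-ensuring mechanisms the abstract refers to are typically built from standard noise-addition primitives. A hedged sketch of the generic Laplace mechanism (not the paper's procedures; the clipping bound and parameters are illustrative) releasing a differentially private mean:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(sensitivity / epsilon) noise: the standard
    epsilon-differentially-private mechanism for a bounded statistic."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
# Mean of n values each clipped to [0, 1]: the sensitivity of the mean is 1/n.
data = np.clip(rng.uniform(size=1000), 0.0, 1.0)
n = len(data)
private_mean = laplace_mechanism(data.mean(), sensitivity=1.0 / n,
                                 epsilon=1.0, rng=rng)
```

    The added noise is what inflates critical values in private hypothesis tests, which is exactly the uncertainty-quantification issue the paper addresses.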

  • Data Summarization via Bilevel Optimization

    Updated: 2024-06-28 16:36:08
    Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause. JMLR 25(73):1–53, 2024. The increasing availability of massive data sets poses various challenges for machine learning. Prominent among these is learning models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is operating on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly …

  • Pareto Smoothed Importance Sampling

    Updated: 2024-06-28 16:36:08
    Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, Jonah Gabry. JMLR 25(72):1–58, 2024. Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be highly variable when the importance ratios have a heavy right tail. This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution, in which case more stable estimates can be obtained by modifying extreme importance ratios. We present a new …
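    The effect of taming extreme ratios can be seen with a cruder relative of Pareto smoothing: hard truncation of the largest weights. The sketch below is not PSIS (which replaces the largest weights by quantiles of a fitted generalized Pareto distribution); the target/proposal pair and the sqrt(n)-scaled truncation threshold are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target p: Exponential(1); proposal q: Exponential(2). The proposal's tail
# is lighter than the target's, so the ratios w = p/q have a heavy right tail.
n = 20_000
x = rng.exponential(scale=0.5, size=n)     # draws from q (rate 2)
log_w = (-x) - (np.log(2.0) - 2.0 * x)     # log p(x) - log q(x) = x - log 2
w = np.exp(log_w)

def snis(w, h):
    """Self-normalized importance sampling estimate of E_p[h(X)]."""
    return np.sum(w * h) / np.sum(w)

# Truncating the largest ratios trades a little bias for lower variance;
# PSIS instead smooths the upper tail rather than capping it.
w_trunc = np.minimum(w, np.sqrt(n) * w.mean())
est_raw, est_trunc = snis(w, x), snis(w_trunc, x)   # true value E_p[X] = 1
```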

  • Optimal Locally Private Nonparametric Classification with Public Data

    Updated: 2024-06-28 16:36:08
    Yuheng Ma, Hanfang Yang. JMLR 25(167):1–62, 2024. In this work, we investigate the problem of public-data-assisted non-interactive locally differentially private (LDP) learning with a focus on non-parametric classification. Under the posterior drift assumption, we derive for the first time the minimax optimal convergence rate under the LDP constraint. Then, we present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a …

  • Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

    Updated: 2024-06-28 16:36:08
    Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura. JMLR 25(134):1–26, 2024. Node2vec is a graph embedding method that learns a vector representation for each node of a weighted graph while seeking to preserve relative proximity and global structure. Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this, we introduce a topological loss term to be added to the training loss of Node2vec, which tries to align the persistence diagram (PD) of …

  • Learning to Warm-Start Fixed-Point Optimization Algorithms

    Updated: 2024-06-28 16:36:08
    Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato. JMLR 25(166):1–46, 2024. We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground-truth solution. In this way, the neural network predicts warm starts with the end-to-end goal …
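    Why warm starts help is easy to see on a toy contraction: after a fixed budget of iterations, the residual is proportional to the distance from the starting point to the fixed point. A hedged sketch (the affine map and constants are invented; the paper learns the warm start with a network rather than assuming it):

```python
import numpy as np

def fixed_point_iterate(T, z0, k):
    """Run k iterations of z <- T(z); return the final iterate and residual."""
    z = z0
    for _ in range(k):
        z = T(z)
    return z, float(np.linalg.norm(T(z) - z))

# Contractive affine map T(z) = A z + b with ||A|| < 1: unique fixed point
# z* = (I - A)^{-1} b, and the residual shrinks geometrically from z0.
A = np.array([[0.5, 0.1], [0.0, 0.4]])
b = np.array([1.0, 2.0])
T = lambda z: A @ z + b
z_star = np.linalg.solve(np.eye(2) - A, b)

_, res_cold = fixed_point_iterate(T, np.zeros(2), k=5)     # cold start
_, res_warm = fixed_point_iterate(T, z_star + 0.01, k=5)   # near-optimal warm start
```

    With the same five-iteration budget, the warm start's residual is orders of magnitude smaller, which is the quantity the paper's first loss function targets.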

  • Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

    Updated: 2024-06-28 16:36:08
    Katerina Hlaváčková-Schindler, Anna Melnykova, Irene Tubikanec. JMLR 25(133):1–26, 2024. Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation, and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by …

  • A General Framework for the Analysis of Kernel-based Tests

    Updated: 2024-06-28 16:36:08
    Tamara Fernández, Nicolás Rivera. JMLR 25(95):1–40, 2024. Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper, we propose new theoretical tools that can be used to study the asymptotic behaviour of kernel-based tests in various data scenarios and in different testing problems. Unlike current approaches, our methods avoid working with U- and V-statistic expansions that usually lead to lengthy and tedious …

  • Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks

    Updated: 2024-06-28 16:36:08
    Yunpeng Zhao, Ning Hao, Ji Zhu. JMLR 25(150):1–42, 2024. Bipartite graphs are ubiquitous across various scientific and engineering fields. Simultaneously grouping the two types of nodes in a bipartite graph via biclustering represents a fundamental challenge in network analysis for such graphs. The latent block model (LBM) is a commonly used model-based tool for biclustering. However, the effectiveness of the LBM is often limited by the influence of row and column sums in the data matrix. To address this …

  • PyGOD: A Python Library for Graph Outlier Detection

    Updated: 2024-06-28 16:36:08
    Kay Liu, Yingtong Dou, Xueying Ding, Xiyang Hu, Ruitong Zhang, Hao Peng, Lichao Sun, Philip S. Yu. JMLR 25(141):1–9, 2024. PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize …

  • Nonparametric Regression Using Over-parameterized Shallow ReLU Neural Networks

    Updated: 2024-06-28 16:36:08
    Yunfei Yang, Ding-Xuan Zhou. JMLR 25(165):1–35, 2024. It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence up to logarithmic factors for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression problem of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness …

  • Random Forest Weighted Local Fréchet Regression with Random Objects

    Updated: 2024-06-28 16:36:08
    Rui Qiu, Zhou Yu, Ruoqing Zhu. JMLR 25(107):1–69, 2024. Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with complex metric-space-valued responses and Euclidean predictors. However, the local approach therein involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, we propose in this paper a novel random forest weighted local Fréchet regression paradigm. The …

  • Linear Distance Metric Learning with Noisy Labels

    Updated: 2024-06-28 16:36:08
    Meysam Alishahi, Anna Little, Jeff M. Phillips. JMLR 25(121):1–53, 2024. In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy …

  • Statistical Inference for Fairness Auditing

    Updated: 2024-06-28 16:36:08
    John J. Cherian, Emmanuel J. Candès. JMLR 25(149):1–49, 2024. Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as fairness auditing, in terms of multiple hypothesis testing. We show how the bootstrap can …

  • Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces

    Updated: 2024-06-28 16:36:08
    Rui Wang, Yuesheng Xu, Mingsong Yan. JMLR 25(93):1–45, 2024. Sparsity of a learning solution is a desirable feature in machine learning. Certain reproducing kernel Banach spaces (RKBSs) are appropriate hypothesis spaces for sparse learning methods. The goal of this paper is to understand what kind of RKBSs can promote sparsity for learning solutions. We consider two typical learning models in an RKBS: the minimum norm interpolation (MNI) problem and the regularization problem. We first establish an explicit representer …

  • Representation Learning via Manifold Flattening and Reconstruction

    Updated: 2024-06-28 16:36:08
    Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma. JMLR 25(132):1–47, 2024. A common assumption for real-world, learnable data is its possession of some low-dimensional structure, and one way to formalize this structure is through the manifold hypothesis: that learnable data lies near some low-dimensional manifold. Deep learning architectures often have a compressive autoencoder component, where data is mapped to a lower-dimensional latent space, but often many architecture design choices are done by hand …

  • Unsupervised Anomaly Detection Algorithms on Real-world Data: How Many Do We Need?

    Updated: 2024-06-28 16:36:08
    Roel Bouman, Zaharah Bukhsh, Tom Heskes. JMLR 25(105):1–34, 2024. In this study we evaluate 33 unsupervised anomaly detection algorithms on 52 real-world multivariate tabular data sets, performing the largest comparison of unsupervised anomaly detection algorithms to date. On this collection of data sets, the Extended Isolation Forest (EIF) algorithm significantly outperforms most other algorithms. Visualizing and then clustering the relative performance of the considered algorithms on all data sets, we …

  • OpenBox: A Python Toolkit for Generalized Black-box Optimization

    Updated: 2024-06-28 16:36:08
    Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.
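    The kind of task a BBO toolkit automates can be reduced to its simplest possible form: evaluate the objective at sampled configurations, keep the best. The sketch below is plain random search, not OpenBox's actual interface; the function and parameter names are invented for illustration (OpenBox adds surrogate models, visualization, and task management on top of this loop):

```python
import random

def random_search(objective, bounds, n_trials=200, seed=0):
    """Minimal black-box optimization: sample uniformly in the box,
    keep the best observed configuration (no gradients, only evaluations)."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = objective(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy objective: shifted quadratic with minimum 0 at (3, -2).
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2
x_best, y_best = random_search(f, bounds=[(-5, 5), (-5, 5)], n_trials=500)
```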

  • Bagging Provides Assumption-free Stability

    Updated: 2024-06-28 16:36:08
    Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms.
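    The stabilizing effect is visible in a toy experiment: a hard-threshold base algorithm whose output flips entirely when one observation moves, versus its bagged version, which averages over bootstrap resamples. A hedged sketch, with the threshold rule and constants invented for illustration (the paper's guarantee is far more general):

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_threshold(train_y):
    """Unstable base algorithm: predict 1 if the sample mean exceeds 0.5,
    else 0. Near the threshold, a single data point can flip the output."""
    return 1.0 if train_y.mean() > 0.5 else 0.0

def bagged(train_y, n_bags=2000):
    """Average the base algorithm over bootstrap resamples (bagging)."""
    n = len(train_y)
    preds = [hard_threshold(train_y[rng.integers(0, n, size=n)])
             for _ in range(n_bags)]
    return float(np.mean(preds))

y = rng.uniform(size=100)
y = y + (0.499 - y.mean())       # shift so the mean sits just below 0.5
y_pert = y.copy()
y_pert[0] += 0.4                 # perturb one observation: mean crosses 0.5

jump_base = abs(hard_threshold(y_pert) - hard_threshold(y))  # jumps by 1
jump_bag = abs(bagged(y_pert) - bagged(y))                   # moves only slightly
```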

  • Nonparametric Copula Models for Multivariate, Mixed, and Missing Data

    Updated: 2024-06-28 16:36:08
    Joseph Feldman, Daniel R. Kowal. JMLR 25(164):1–50, 2024. Modern data sets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with fully-observed variables, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies among possibly many variables of mixed data types.

  • Generative Adversarial Ranking Nets

    Updated: 2024-06-28 16:36:08
    Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao. JMLR 25(119):1–35, 2024. We propose a new adversarial training framework, generative adversarial ranking networks (GARNet), to learn from user preferences among a list of samples so as to generate data meeting user-specific criteria. Specifically, GARNet consists of two modules: a ranker and a generator. The generator fools the ranker to raise generated samples to the top, while the ranker learns to rank generated samples at the bottom. Meanwhile, the ranker learns to rank samples regarding the …

  • Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning

    Updated: 2024-06-28 16:36:08
    Yiling Xie, Xiaoming Huo. JMLR 25(148):1–40, 2024. We propose an adjusted Wasserstein distributionally robust estimator in statistical learning, based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator. The classic WDRO estimator is asymptotically biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting in a smaller asymptotic mean squared error. Further, under certain conditions, our proposed adjustment technique provides a general principle to de-bias …

  • Exploration of the Search Space of Gaussian Graphical Models for Paired Data

    Updated: 2024-06-28 16:36:08
    Alberto Roverato, Dung Ngoc Nguyen. JMLR 25(92):1–41, 2024. We consider the problem of learning a Gaussian graphical model in the case where the observations come from two dependent groups sharing the same variables. We focus on a family of coloured Gaussian graphical models specifically suited for the paired data problem. Commonly, graphical models are ordered by the submodel relationship so that the search space is a lattice, called the model inclusion lattice. We introduce a novel order between models, named the …

  • Predictive Inference with Weak Supervision

    Updated: 2024-06-28 16:36:08
    Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi. JMLR 25(118):1–45, 2024. The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets, that is, sets that cover a true label with a prescribed probability, independent of …
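    The fully supervised baseline the paper generalizes is split conformal prediction: calibrate a score threshold on held-out labeled data so that prediction sets cover the true label with probability at least 1 - alpha. A hedged numpy sketch with an invented toy data-generating process:

```python
import numpy as np

rng = np.random.default_rng(0)

# Calibration data from a toy model y = 2x + noise.
n_cal, alpha = 1000, 0.1
x_cal = rng.uniform(0, 1, size=n_cal)
y_cal = 2.0 * x_cal + rng.normal(scale=0.3, size=n_cal)

predict = lambda x: 2.0 * x                   # any fixed point predictor
scores = np.abs(y_cal - predict(x_cal))       # conformity scores
k = int(np.ceil((n_cal + 1) * (1 - alpha)))   # conformal quantile index
q_hat = np.sort(scores)[k - 1]

# Prediction set at a new x: the interval predict(x) +/- q_hat.
x_new = rng.uniform(0, 1, size=5000)
y_new = 2.0 * x_new + rng.normal(scale=0.3, size=5000)
coverage = np.mean(np.abs(y_new - predict(x_new)) <= q_hat)   # about 1 - alpha
```

    The paper's contribution is making such sets valid when the calibration labels themselves are only partially or weakly observed.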

  • Fixed points of nonnegative neural networks

    Updated: 2024-06-28 16:36:08
    Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor. JMLR 25(139):1–40, 2024. We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and weakly scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks …
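    The objects under study are easy to instantiate: a ReLU network with nonnegative weights and biases maps the nonnegative orthant into itself and is entrywise monotone. A hedged sketch with invented weights chosen small enough that the map is also a contraction, so naive iteration reaches a fixed point (the paper's existence conditions are far weaker than contractivity):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# One-hidden-layer network with nonnegative weights and biases:
# maps nonnegative vectors to nonnegative vectors, and is monotone.
W1 = np.array([[0.3, 0.1], [0.2, 0.2]])
b1 = np.array([0.1, 0.0])
W2 = np.array([[0.2, 0.3], [0.1, 0.2]])
b2 = np.array([0.5, 0.2])

f = lambda x: W2 @ relu(W1 @ x + b1) + b2

# With these small weights f is a contraction, so iterating converges.
x = np.zeros(2)
for _ in range(100):
    x = f(x)
residual = float(np.linalg.norm(f(x) - x))   # essentially zero
```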

  • An Analysis of Quantile Temporal-Difference Learning

    Updated: 2024-06-28 16:36:08
    Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney. JMLR 25(163):1–47, 2024. We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed …

  • Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data

    Updated: 2024-06-28 16:36:08
    Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini. JMLR 25(104):1–47, 2024. In this paper, we propose a probabilistic framework for analyzing a multi-class majority vote classifier in the case where training data is partially labeled. First, we derive a multi-class transductive bound over the risk of the majority vote classifier, which is based on the classifier's vote distribution over each class. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier in …

  • Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression

    Updated: 2024-06-28 16:36:08
    Jiading Liu, Lei Shi. JMLR 25(155):1–56, 2024. Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression …

  • Differentially Private Data Release for Mixed-type Data via Latent Factor Models

    Updated: 2024-06-28 16:36:08
    Yanqing Zhang, Qi Xu, Niansheng Tang, Annie Qu. JMLR 25(116):1–37, 2024. Differential privacy is a particular data privacy-preserving technology which enables synthetic data or statistical analysis results to be released with a minimum disclosure of private information from individual records. The tradeoff between privacy-preserving and utility guarantee is always a challenge for differential privacy technology, especially for synthetic data generation. In this paper, we propose a differentially private data …

  • DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models

    Updated: 2024-06-28 16:36:08
    We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosing causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries, all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm.

  • Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

    Updated: 2024-06-28 16:36:08
    Isaac Gibbs, Emmanuel J. Candès. JMLR 25(162):1–36, 2024. We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here, we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive …
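    The adaptive scheme this paper builds on (Gibbs and Candès, 2021) is a one-line online update: widen the working level after a miscoverage, shrink it after a cover, so that long-run coverage tracks 1 - alpha even under distribution shift. A hedged sketch with an invented drifting score stream and rolling calibration window:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, gamma, T = 0.1, 0.05, 4000
alpha_t = alpha
history, errs = [], []
for t in range(T):
    scale = 1.0 + t / T                     # slow distribution shift
    s = abs(rng.normal()) * scale           # today's conformity score
    if len(history) >= 100:
        window = history[-300:]             # recent scores for calibration
        level = min(max(1.0 - alpha_t, 0.0), 1.0)
        q = np.quantile(window, level)      # working threshold
        err = 1.0 if s > q else 0.0         # 1 on miscoverage
        errs.append(err)
        alpha_t += gamma * (alpha - err)    # the adaptive level update
    history.append(s)
coverage = 1.0 - float(np.mean(errs))       # close to 1 - alpha
```

    The long-run coverage follows from a telescoping argument: the sum of (alpha - err_t) equals the total change in alpha_t divided by gamma, which stays bounded. The paper's procedure refines this to give small regret on every local time window.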

  • The Non-Overlapping Statistical Approximation to Overlapping Group Lasso

    Updated: 2024-06-28 16:36:08
    Mingyu Qi, Tianxi Li. JMLR 25(115):1–70, 2024. The group lasso penalty is widely used to introduce structured sparsity in statistical learning, characterized by its ability to eliminate predefined groups of parameters automatically. However, when the groups overlap, solving the group lasso problem can be time-consuming in high-dimensional settings due to the groups' non-separability. This computational challenge has limited the applicability of the overlapping group lasso penalty in cutting-edge areas, such as gene pathway …
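    The computational appeal of the non-overlapping approximation is that the non-overlapping group lasso penalty is separable across groups, so its proximal operator is a closed-form blockwise soft-thresholding. A hedged sketch (`groups` and `lam` are illustrative names, not the paper's notation):

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the non-overlapping group lasso penalty
    lam * sum_g ||beta_g||_2: shrink each group's norm toward zero,
    zeroing out whole groups whose norm falls below lam."""
    out = beta.copy()
    for g in groups:
        norm = np.linalg.norm(beta[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.1, 0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
# First group (norm 5) shrinks to norm 4; second (norm ~0.14) is zeroed.
```

    With overlapping groups no such closed form exists, which is exactly the separability obstacle the abstract describes.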

  • Functional Directed Acyclic Graphs

    Updated: 2024-06-28 16:36:08
    Kuang-Yao Lee, Lexin Li, Bing Li. JMLR 25(78):1–48, 2024. In this article, we introduce a new method to estimate a directed acyclic graph (DAG) from multivariate functional data. We build on the notion of faithfulness that relates a DAG with a set of conditional independences among the random functions. We develop two linear operators, the conditional covariance operator and the partial correlation operator, to characterize and evaluate the conditional independence. Based on these operators, we adapt and extend the PC-algorithm to estimate the functional …

  • Information Processing Equalities and the Information–Risk Bridge

    Updated: 2024-06-28 16:36:08
    Robert C. Williamson, Zac Cranko. JMLR 25(103):1–53, 2024. We introduce two new classes of measures of information for statistical experiments which generalise and subsume φ-divergences, integral probability metrics, N-distances (MMD), and (f,Γ)-divergences between two or more distributions. This enables us to derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem, thus extending the variational φ-divergence representation to multiple distributions in an entirely …

  • Unlabeled Principal Component Analysis and Matrix Completion

    Updated: 2024-06-28 16:36:08
    Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris. JMLR 25(77):1–38, 2024. We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem, since we prove that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system …

  • Nonparametric Regression for 3D Point Cloud Learning

    Updated: 2024-06-28 16:36:08
    Nonparametric Regression for 3D Point Cloud Learning. Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai. 25(102):1–56, 2024. Abstract: In recent years, the amount of point clouds with irregular shapes collected in various areas has grown exponentially. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over triangulation to extract the underlying signal and build a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud …

  • Flexible Bayesian Product Mixture Models for Vector Autoregressions

    Updated: 2024-06-28 16:36:08
    Flexible Bayesian Product Mixture Models for Vector Autoregressions. Suprateek Kundu, Joshua Lukemire. 25(146):1–52, 2024. Abstract: Bayesian non-parametric methods based on Dirichlet process mixtures have seen tremendous success in various domains and are appealing in their ability to borrow information by clustering samples that share identical parameters. However, such methods can face hurdles in heterogeneous settings where objects are expected to cluster only along a subset of axes, or where clusters of samples share only a subset of identical parameters. We overcome such limitations by developing a …

  • More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization

    Updated: 2024-06-28 16:36:08
    More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization. Xu Liu, Heng Lian, Jian Huang. 25(161):1–27, 2024. Abstract: We consider parsimonious modeling of high-dimensional multivariate additive models using regression splines, with or without sparsity assumptions. The approach is based on treating the coefficients in the spline expansions as a third-order tensor. Note that the data have neither tensor predictors nor tensor responses, which distinguishes our study from existing ones. A Tucker decomposition is used to reduce the number of parameters in …
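    A hedged sketch of the Tucker step: a truncated higher-order SVD compressing a third-order tensor, as a numerical stand-in for the paper's penalized estimation of the spline coefficient tensor (which is not reproduced here).

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t along the given mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, m, mode):
    """Multiply tensor t by matrix m along the given mode."""
    return np.moveaxis(np.tensordot(m, np.moveaxis(t, mode, 0), axes=1), 0, mode)

def hosvd(t, ranks):
    """Truncated higher-order SVD: Tucker core plus factor matrices."""
    factors = [np.linalg.svd(unfold(t, mode), full_matrices=False)[0][:, :r]
               for mode, r in enumerate(ranks)]
    core = t
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def tucker_reconstruct(core, factors):
    t = core
    for mode, u in enumerate(factors):
        t = mode_product(t, u, mode)
    return t

# a tensor with exact multilinear rank (2, 2, 2): HOSVD recovers it exactly
rng = np.random.default_rng(0)
g = rng.normal(size=(2, 2, 2))
a, b, c = (rng.normal(size=(dim, 2)) for dim in (6, 7, 8))
t = tucker_reconstruct(g, [a, b, c])
core, factors = hosvd(t, (2, 2, 2))
err = np.linalg.norm(t - tucker_reconstruct(core, factors)) / np.linalg.norm(t)
print(f"relative reconstruction error: {err:.1e}")
```

    The parameter saving is the point: the 6×7×8 tensor (336 entries) is stored as a 2×2×2 core plus three thin factor matrices (50 entries).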

  • AMLB: an AutoML Benchmark

    Updated: 2024-06-28 16:36:08
    AMLB: an AutoML Benchmark. Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren. 25(101):1–65, 2024. Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with …

  • Spatial meshing for general Bayesian multivariate models

    Updated: 2024-06-28 16:36:08
    Spatial meshing for general Bayesian multivariate models. Michele Peruzzi, David B. Dunson. 25(87):1–49, 2024. Abstract: Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large-scale data settings on which we focus. The scenario worsens in non-Gaussian models, where the reduced analytical tractability leads to additional hurdles to …

  • Semi-supervised Inference for Block-wise Missing Data without Imputation

    Updated: 2024-06-28 16:36:08
    We consider statistical inference for single or low-dimensional parameters in a high-dimensional linear model under a semi-supervised setting, wherein the data are a combination of a labelled block-wise missing data set of a relatively small size and a large unlabelled data set. The proposed method utilises both labelled and unlabelled data without any imputation or removal of the missing observations. The asymptotic properties of the estimator are established under regularity conditions. Hypothesis testing for low-dimensional coefficients is also studied. Extensive simulations are conducted to examine the theoretical results. The method is evaluated on the Alzheimer’s Disease Neuroimaging Initiative data.

  • Transport-based Counterfactual Models

    Updated: 2024-06-28 16:36:08
    Transport-based Counterfactual Models. Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, Jean-Michel Loubes. 25(136):1–59, 2024. Abstract: Counterfactual frameworks have grown popular in machine learning, both for explaining algorithmic decisions and for defining individual notions of fairness that are more intuitive than typical group fairness conditions. However, state-of-the-art models to compute counterfactuals are either unrealistic or infeasible. In particular, while Pearl's causal inference provides appealing rules to calculate counterfactuals, it relies on a model that is …
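    A hedged sketch of the transport idea in the one-dimensional case, where the optimal transport map between empirical measures is the monotone (rank-preserving) rearrangement; the paper's setting is far more general, and the function below is purely illustrative.

```python
import numpy as np

def counterfactual_1d(x_a, x_b):
    """Transport each observation in sample x_a to its image in sample x_b:
    the rank-r point of group A is mapped to the rank-r point of group B."""
    order_a = np.argsort(x_a)
    ranks_a = np.empty(len(x_a), dtype=int)
    ranks_a[order_a] = np.arange(len(x_a))
    return np.sort(x_b)[ranks_a]

x_a = np.array([3.0, 1.0, 2.0])      # factual outcomes, group A
x_b = np.array([10.0, 30.0, 20.0])   # outcomes, group B
print(counterfactual_1d(x_a, x_b))   # [30. 10. 20.]
```

    The largest outcome in A is matched to the largest in B, and so on down the ranks, which is the quantile-to-quantile coupling.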

  • Spectral learning of multivariate extremes

    Updated: 2024-06-28 16:36:08
    Spectral learning of multivariate extremes. Marco Avella Medina, Richard A. Davis, Gennady Samorodnitsky. 25(124):1–36, 2024. Abstract: We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest-neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the …
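    A hedged sketch of the pipeline the abstract describes: keep the angular parts of the most extreme observations, build a k-nearest-neighbor graph on them, and cluster a spectral embedding of that graph. This is a pure-numpy stand-in, not the authors' algorithm; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# heavy-tailed sample concentrated around two extremal directions
n = 400
radii = (1 - rng.random(n)) ** -1.0                       # Pareto(1) radii
dirs = np.where(rng.random(n)[:, None] < 0.5,
                [1.0, 0.0], [0.0, 1.0]) + 0.05 * rng.normal(size=(n, 2))
x = radii[:, None] * dirs

# keep only the angular parts of the most extreme observations
norms = np.linalg.norm(x, axis=1)
angles = x[norms >= np.quantile(norms, 0.9)]
angles /= np.linalg.norm(angles, axis=1, keepdims=True)

# symmetric k-nearest-neighbor graph on the angular sample
k = 5
d = np.linalg.norm(angles[:, None] - angles[None, :], axis=2)
nn = np.argsort(d, axis=1)[:, 1:k + 1]
a = np.zeros_like(d)
a[np.repeat(np.arange(len(angles)), k), nn.ravel()] = 1.0
a = np.maximum(a, a.T)

# spectral embedding from the graph Laplacian, then a tiny 2-means
lap = np.diag(a.sum(axis=1)) - a
_, vecs = np.linalg.eigh(lap)
emb = vecs[:, :2]
centers = emb[[0, int(np.argmax(np.linalg.norm(emb - emb[0], axis=1)))]]
for _ in range(10):
    labels = np.argmin(np.linalg.norm(emb[:, None] - centers[None], axis=2), axis=1)
    centers = np.array([emb[labels == j].mean(axis=0) for j in (0, 1)])
```

    On this toy sample the recovered labels coincide with the two extremal directions the data were generated from.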

  • Fat-Shattering Dimension of k-fold Aggregations

    Updated: 2024-06-28 16:36:08
    We provide estimates on the fat-shattering dimension of aggregation rules of real-valued function classes. Such a rule consists of choosing k functions, one from each of k classes, and computing a pointwise "aggregate" of these, such as the median, mean, or maximum. The bounds are stated in terms of the fat-shattering dimensions of the component classes. For linear and affine function classes, we provide a considerably sharper upper bound and a matching lower bound, achieving, in particular, an optimal dependence on k. Along the way, we improve several known results in addition to pointing out and correcting a number of erroneous claims in the literature.
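    For reference, the standard (textbook, not quoted from the paper) definition in play: a class $F$ of real-valued functions $\gamma$-shatters points $x_1,\dots,x_m$ if there are witnesses $r_1,\dots,r_m$ such that

```latex
\forall\, b \in \{-1,+1\}^m \;\; \exists f \in F :\quad
b_i\bigl(f(x_i) - r_i\bigr) \ge \gamma \quad \text{for } i = 1,\dots,m,
```

    and $\mathrm{fat}_\gamma(F)$ is the largest such $m$.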

  • A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables

    Updated: 2024-06-28 16:36:08
    A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables. Wei Luo, Yeying Zhu, Xuekui Zhang, Lin Lin. 25(86):1–38, 2024. Abstract: In instrumental variable analysis, which conducts causal inference in the presence of unmeasured confounding, invalid and weak instrumental variables often exist and complicate the analysis. In this paper, we propose a model-free dimension reduction procedure to select the invalid instrumental variables and refine them into lower-dimensional linear combinations. The procedure also …

  • Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

    Updated: 2024-06-28 16:36:08
    Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits. Junpei Komiyama, Edouard Fouché, Junya Honda. 25(112):1–56, 2024. Abstract: We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from the literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of …
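    To make the windowing idea concrete, here is a sliding-window UCB agent, one of the simplest members of the family the abstract builds on (the paper's ADR-bandit resets its windows adaptively; this sketch uses a fixed window and is purely illustrative).

```python
import math
import random
from collections import deque

class SlidingWindowUCB:
    """UCB computed only over the last `window` pulls (fixed window)."""
    def __init__(self, n_arms, window=100, c=2.0):
        self.n_arms, self.c, self.t = n_arms, c, 0
        self.history = deque(maxlen=window)   # (arm, reward) pairs

    def select(self):
        self.t += 1
        pulls = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            pulls[arm] += 1
            sums[arm] += r
        for a in range(self.n_arms):          # an arm absent from the window
            if pulls[a] == 0:                 # is re-explored immediately
                return a
        ucb = [sums[a] / pulls[a] + math.sqrt(self.c * math.log(self.t) / pulls[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, reward))

# toy abruptly-changing environment: the best arm switches at step 1000
random.seed(0)
agent = SlidingWindowUCB(2, window=100)
correct = 0
for step in range(2000):
    means = [0.9, 0.1] if step < 1000 else [0.1, 0.9]
    arm = agent.select()
    agent.update(arm, 1.0 if random.random() < means[arm] else 0.0)
    correct += arm == (0 if step < 1000 else 1)
print("share of pulls on the currently-best arm:", correct / 2000)
```

    Because old pulls fall out of the window, the stale estimate of the formerly-best arm decays quickly after the change point, which a window-free UCB cannot do.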

  • Sum-of-norms clustering does not separate nearby balls

    Updated: 2024-06-28 16:36:08
    Sum-of-norms clustering does not separate nearby balls. Alexander Dunlap, Jean-Christophe Mourrat. 25(123):1–40, 2024. Abstract: Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this …
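    For context, sum-of-norms clustering solves the convex program (a standard formulation, not quoted from the paper)

```latex
\min_{u_1,\dots,u_n \in \mathbb{R}^d}\;
\frac{1}{2}\sum_{i=1}^{n}\lVert x_i - u_i\rVert^2
\;+\;\lambda \sum_{i<j}\lVert u_i - u_j\rVert,
```

    and points $i$ and $j$ are declared to be in the same cluster exactly when their minimizers coincide, $u_i = u_j$; the result summarized above concerns when this fusion fails to match the true two-ball decomposition.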

  • Adaptive Latent Feature Sharing for Piecewise Linear Dimensionality Reduction

    Updated: 2024-06-28 16:36:08
    Adaptive Latent Feature Sharing for Piecewise Linear Dimensionality Reduction. Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little. 25(135):1–42, 2024. Abstract: Linear Gaussian exploratory tools such as principal component analysis (PCA) and factor analysis (FA) are widely used for exploratory analysis, pre-processing, data visualization, and related tasks. Because the linear-Gaussian assumption is restrictive, for very high-dimensional problems they have been replaced by robust, sparse extensions or more flexible discrete-continuous latent feature models. Discrete-continuous latent …

  • Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport

    Updated: 2024-06-28 16:36:08
    Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport. Ricardo Baptista, Rebecca Morrison, Olivier Zahm, Youssef Marzouk. 25(85):1–46, 2024. Abstract: Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most …

  • ✚ Visualization Tools and Learning Resources, June 2024 Roundup

    Updated: 2024-06-27 18:30:33
    Visualization Tools and Learning Resources, June 2024 Roundup. June 27, 2024. Topic: The Process, roundup. Welcome to The Process, the newsletter for FlowingData members that looks closer at how the charts get made. I’m Nathan Yau. Every month I collect tools and resources to help make better charts. Here is the good stuff for June 2024. To access this issue of The Process, you must be a member. The Process is a weekly newsletter on how visualization tools, rules, and guidelines work in practice, published every Thursday.

  • Pivot Table vs. Decomposition Tree: Advancing Data Insights

    Updated: 2024-06-27 08:53:13
    In the evolving data landscape, the pivot table has long been a cornerstone of the data analyst’s toolkit. Traditionally used to summarize datasets in a condensed tabular form, pivot tables facilitate quick overviews and basic drill-down capabilities. However, as data sets grow larger and more complex, the static and manual nature of pivot tables can […] The post Pivot Table vs. Decomposition Tree: Advancing Data Insights appeared first on AnyChart News.

  • AI and data journalism: the AP’s Garance Burke

    Updated: 2024-06-26 17:13:54
    Garance Burke⁠ is a global investigative reporter for the Associated Press, with a focus on reporting around Artificial Intelligence. She wrote the chapter of the AP style guide around reporting on AI and leads a team which works with data to tell stories every day. She joins Simon and Alberto to discuss the implications for …

  • Engaging Data Visualizations to Explore — DataViz Weekly

    Updated: 2024-06-21 23:59:39
    Welcome to another edition of DataViz Weekly, where we bring you some of the most interesting data visualizations we’ve recently come across. This week, we’re highlighting four fresh projects that effectively use data visualization to provide valuable insights: Tracking heat across the United States — The New York Times EU gas insights — Strategic Perspectives […] The post Engaging Data Visualizations to Explore — DataViz Weekly appeared first on AnyChart News.

  • ✚ Games to Explore Data and Possibilities

    Updated: 2024-06-20 18:30:09
    Games to Explore Data and Possibilities. June 20, 2024. Topic: The Process, games, uncertainty. Visualization and analysis is usually about minimizing uncertainty to more clearly see patterns. On the other hand, games force you to play through uncertainty to get from point A to point B. To access this issue of The Process, you must be a member.

  • Revealing Insights with Data Visualizations — DataViz Weekly

    Updated: 2024-06-14 00:56:05
    Data visualizations bridge the gap between raw numbers and clear, understandable insights. This week on DataViz Weekly, we showcase four remarkable new examples of how charts and maps illuminate diverse topics in a comprehensible and engaging manner: In-flight turbulence — South China Morning Post Shifts in occupation and income — FlowingData San Francisco’s culinary diversity […] The post Revealing Insights with Data Visualizations — DataViz Weekly appeared first on AnyChart News.
